Privacy and Anonymity in Graph Data
Abstract
Anonymization of datasets is an important problem in many scenarios: the census bureau publishes anonymized information, hospitals want to make anonymized patient records available to health researchers, and network service providers may want to publish network traces. Anonymization techniques have been investigated widely in the past few years [Sweeney, 2002; Machanavajjhala et al., 2006], but this work has focused on anonymizing a single table, where there is a 1:1 correspondence between tuples and real-world individuals and tuples can be treated independently. We consider datasets that cannot be modelled adequately within the single-table framework. As soon as the database contains information about interactions between real-world entities, tuples in the database are no longer independent, and conventional anonymization techniques fail for several reasons. This is in fact the case for many interesting datasets: email communication traces, social networks, and data sharing in the intelligence community are only a few examples.
There are two fundamentally different approaches to preventing the publication of private information about individuals. If the database is stored on a central server and all authorized parties access the data through this server, then we can add noise to query answers and thus prevent an adversary from learning too much about individuals. This approach does not work well when the information should be made available to the general public, or when the data owner does not want to maintain the necessary infrastructure. Furthermore, collusion among parties and the evolution of the database over time are not handled satisfactorily in this setting. A second and more general approach is to anonymize the database once and then publish the anonymized version. Because we then know nothing about the specific queries that will be run on the published database, we have to give privacy guarantees with respect to all possible queries. In this report, we investigate the problems associated with the second approach in a graph data context. In Section 2, we introduce the Enron email dataset as an example that is used throughout the rest of this report. A formal framework for anonymization and disclosure analysis is
Similar resources
An Effective Method for Utility Preserving Social Network Graph Anonymization Based on Mathematical Modeling
In recent years, privacy concerns about social network graph data publishing have increased due to the widespread use of such data for research purposes. This paper addresses the problem of identity disclosure risk of a node assuming that the adversary identifies one of its immediate neighbors in the published data. The related anonymity level of a graph is formulated and a mathematical model is...
Improved Univariate Microaggregation for Integer Values
Privacy issues during data publishing are an increasing concern for the entities involved. The problem is addressed in the field of statistical disclosure control with the aim of producing protected datasets that are also useful for interested end users such as government agencies and research communities. The problem of producing useful protected datasets is addressed in multiple computational priva...
Quality Aware Privacy Protection for Location-Based Services
Protection of users’ privacy has been a central issue for location-based services (LBSs). In this paper, we classify two kinds of privacy protection requirements in LBS: location anonymity and identifier anonymity. While the location cloaking technique under the k-anonymity model can provide a good protection of users’ privacy, it reduces the resolution of location information and, hence, may d...
Structural Diversity for Privacy in Publishing Social Networks
How to protect individual privacy in public data is always a concern. For social networks, the challenge is that the structure of the social network graph can be used to infer private and sensitive information about users. The existing anonymity schemes mostly focus on the anonymity of vertex identities, such that a malicious attacker cannot associate a user with a specific vertex. In re...
De-SAG: On the De-anonymization of Structure-Attribute Graph Data
In this paper, we study the impacts of non-Personal Identifiable Information (non-PII) on the privacy of graph data with attribute information (e.g., social networks data with users’ profiles (attributes)), namely Structure-Attribute Graph (SAG) data, both theoretically and empirically. Our main contributions are two-fold: (i) we conduct the first attribute-based anonymity analysis for SAG data...
k-RDF-Neighbourhood Anonymity: Combining Structural and Attribute-based Anonymisation for Linked Data
We provide a new way of anonymising a heterogeneous graph containing personally identifiable information. The anonymisation algorithm is called k-RDF-neighbourhood anonymity, because it changes the one-hop neighbourhood of at least k persons inside an RDF graph so that they cannot be distinguished. This enhances the privacy of persons represented in the graph. Our approach allows us to control...
Publication date: 2006